Course homepage: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/

Video: https://www.bilibili.com/video/av46216519?from=search&seid=13229282510647565239

This post reviews CS224N Assignment 3. Reference solutions:

https://github.com/ZacBi/CS224n-2019-solutions

1. Machine Learning & Neural Networks

(a)

(i)

This update rule computes a weighted rolling average of past gradients, so it changes little from one step to the next; the lower variance of the averaged gradient reduces oscillation during training.
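
For reference, the momentum update from the handout, which makes the averaging explicit:

$$
\mathbf{m} \leftarrow \beta_1 \mathbf{m} + (1 - \beta_1)\,\nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta}), \qquad \boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\,\mathbf{m}
$$

Unrolling the recursion, $\mathbf{m}$ is an exponentially weighted average of all past gradients, with weight $(1-\beta_1)\beta_1^k$ on the gradient from $k$ steps ago; with $\beta_1 \approx 0.9$, the noise in individual minibatch gradients largely cancels out.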

(ii)

Parameters whose gradients have historically been small receive relatively larger updates, while those with large gradients receive smaller ones. This keeps the update magnitudes roughly comparable across directions, which again reduces oscillation.
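
Again from the handout, the second-moment update and the scaled step ($\odot$ and the division are element-wise):

$$
\mathbf{v} \leftarrow \beta_2 \mathbf{v} + (1 - \beta_2)\,\big(\nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta}) \odot \nabla_{\boldsymbol{\theta}} J_{\text{minibatch}}(\boldsymbol{\theta})\big), \qquad \boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\,\mathbf{m} / \sqrt{\mathbf{v}}
$$

Dividing by $\sqrt{\mathbf{v}}$ shrinks the step in directions whose gradients have been consistently large and enlarges it where they have been small.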

(b)

(i)

We need $\mathbb{E}_{p_{\text{drop}}}[\mathbf{h}_{\text{drop}}]_i = h_i$. Each entry is kept with probability $1 - p_{\text{drop}}$ and scaled by $\gamma$, so $\mathbb{E}_{p_{\text{drop}}}[\mathbf{h}_{\text{drop}}]_i = \gamma (1 - p_{\text{drop}}) h_i$. Therefore

$$
\gamma = \frac{1}{1 - p_{\text{drop}}}
$$

(ii)

Dropout is applied during training so that the network effectively trains an ensemble of thinned sub-networks, which improves generalization; during evaluation we want a deterministic, accurate prediction from the full network, so dropout is disabled.
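
In PyTorch this train/eval distinction is just the module's mode switch; a minimal sketch:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

drop.train()    # training mode: zeros entries with prob p, scales survivors by 1/(1-p)
print(drop(x))  # e.g. tensor([[2., 0., 2., 2., 0., 2.]]) -- random each call

drop.eval()     # evaluation mode: dropout becomes the identity
print(drop(x))  # tensor([[1., 1., 1., 1., 1., 1.]])
```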

2. Neural Transition-Based Dependency Parsing

(a)

| Stack | Buffer | New dependency | Transition |
| --- | --- | --- | --- |
| [ROOT] | [I, parsed, this, sentence, correctly] | | Initial Configuration |
| [ROOT, I] | [parsed, this, sentence, correctly] | | SHIFT |
| [ROOT, I, parsed] | [this, sentence, correctly] | | SHIFT |
| [ROOT, parsed] | [this, sentence, correctly] | parsed$\to$I | LEFT-ARC |
| [ROOT, parsed, this] | [sentence, correctly] | | SHIFT |
| [ROOT, parsed, this, sentence] | [correctly] | | SHIFT |
| [ROOT, parsed, sentence] | [correctly] | sentence$\to$this | LEFT-ARC |
| [ROOT, parsed] | [correctly] | parsed$\to$sentence | RIGHT-ARC |
| [ROOT, parsed, correctly] | [] | | SHIFT |
| [ROOT, parsed] | [] | parsed$\to$correctly | RIGHT-ARC |
| [ROOT] | [] | ROOT$\to$parsed | RIGHT-ARC |

(b)

$O(n)$: each of the $n$ words is shifted exactly once and receives exactly one incoming arc, so a full parse takes $2n$ transitions. For the 5-word sentence in (a), that is 5 SHIFTs plus 5 arc transitions, i.e. 10 steps.

(c)

`PartialParse.__init__`

```python
### YOUR CODE HERE (3 Lines)
### Your code should initialize the following fields:
###     self.stack: The current stack represented as a list with the top of the stack as the
###                 last element of the list.
###     self.buffer: The current buffer represented as a list with the first item on the
###                  buffer as the first item of the list
###     self.dependencies: The list of dependencies produced so far. Represented as a list of
###             tuples where each tuple is of the form (head, dependent).
###             Order for this list doesn't matter.
###
### Note: The root token should be represented with the string "ROOT"
###
self.stack = ["ROOT"]
self.buffer = sentence[:]  # copy, so parsing does not mutate the input sentence
self.dependencies = []
### END YOUR CODE
```
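
A quick check that this reproduces the "Initial Configuration" row of the table in part (a):

```python
pp = PartialParse(["I", "parsed", "this", "sentence", "correctly"])
print(pp.stack)         # ['ROOT']
print(pp.buffer)        # ['I', 'parsed', 'this', 'sentence', 'correctly']
print(pp.dependencies)  # []
```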

`parse_step`

```python
### YOUR CODE HERE (~7-10 Lines)
### TODO:
###     Implement a single parsing step, i.e. the logic for the following as
###     described in the pdf handout:
###         1. Shift
###         2. Left Arc
###         3. Right Arc
if transition == "S":
    # move the first word of the buffer onto the stack
    word = self.buffer.pop(0)
    self.stack.append(word)
elif transition == "LA":
    # the second-to-top word becomes a dependent of the top word
    self.dependencies.append((self.stack[-1], self.stack[-2]))
    self.stack.pop(-2)
else:  # "RA"
    # the top word becomes a dependent of the second-to-top word
    self.dependencies.append((self.stack[-2], self.stack[-1]))
    self.stack.pop(-1)
### END YOUR CODE
```
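
Replaying the full transition sequence from part (a) should then recover all five dependencies:

```python
pp = PartialParse(["I", "parsed", "this", "sentence", "correctly"])
for t in ["S", "S", "LA", "S", "S", "LA", "RA", "S", "RA", "RA"]:
    pp.parse_step(t)
print(pp.stack)         # ['ROOT']
print(pp.dependencies)  # [('parsed', 'I'), ('sentence', 'this'), ('parsed', 'sentence'),
                        #  ('parsed', 'correctly'), ('ROOT', 'parsed')]
```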

(d)

`minibatch_parse`

```python
### YOUR CODE HERE (~8-10 Lines)
### TODO:
###     Implement the minibatch parse algorithm as described in the pdf handout
###
###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
###                 unfinished_parses = partial_parses[:].
###             Here `unfinished_parses` is a shallow copy of `partial_parses`.
###             In Python, a shallow copied list like `unfinished_parses` does not contain new instances
###             of the object stored in `partial_parses`. Rather both lists refer to the same objects.
###             In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
###             contains references to the same objects. Thus, you should NOT use the `del` operator
###             to remove objects from the `unfinished_parses` list. This will free the underlying memory that
###             is being accessed by `partial_parses` and may cause your code to crash.
partial_parses = [PartialParse(sentence) for sentence in sentences]
unfinished_parses = partial_parses[:]  # shallow copy

while unfinished_parses:
    # take (at most) batch_size parses; the slice is a new list, so removing
    # finished parses from unfinished_parses inside the loop below is safe
    batch = unfinished_parses[:batch_size]
    transitions = model.predict(batch)
    for parse, transition in zip(batch, transitions):
        parse.parse_step(transition)
        # a parse is finished once the buffer is empty and only ROOT remains on the stack
        if len(parse.stack) == 1 and len(parse.buffer) == 0:
            unfinished_parses.remove(parse)
dependencies = [parse.dependencies for parse in partial_parses]
### END YOUR CODE
```
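
To exercise `minibatch_parse` without a trained network, a hypothetical `DummyModel` (not part of the starter code) can shift until the buffer is empty and then emit right-arcs:

```python
class DummyModel:
    def predict(self, partial_parses):
        # hypothetical stand-in: SHIFT while the buffer is non-empty, else RIGHT-ARC
        return ["S" if pp.buffer else "RA" for pp in partial_parses]

deps = minibatch_parse([["right", "arcs", "only"]], DummyModel(), batch_size=2)
print(deps)  # [[('arcs', 'only'), ('right', 'arcs'), ('ROOT', 'right')]]
```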

(e)

`ParserModel.__init__`

```python
# feed-forward network: embeddings -> hidden (ReLU) -> logits, with Xavier init
self.embed_to_hidden = nn.Linear(self.n_features * self.embed_size, self.hidden_size)
nn.init.xavier_uniform_(self.embed_to_hidden.weight)
self.dropout = nn.Dropout(self.dropout_prob)
self.hidden_to_logits = nn.Linear(self.hidden_size, self.n_classes)
nn.init.xavier_uniform_(self.hidden_to_logits.weight)
```

`embedding_lookup`

```python
# t: (batch_size, n_features) word indices -> (batch_size, n_features, embed_size)
x = self.pretrained_embeddings(t)
# flatten the feature embeddings into one vector per example:
# (batch_size, n_features * embed_size)
x = x.view(x.size()[0], -1)
```

`forward`

```python
embeddings = self.embedding_lookup(t)
hidden = self.embed_to_hidden(embeddings)
hidden = F.relu(hidden)  # assuming torch.nn.functional is imported as F, as in the starter code
hidden = self.dropout(hidden)
logits = self.hidden_to_logits(hidden)
```
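
Assuming the assignment's default hyperparameters (36 features, 50-dimensional embeddings, hidden size 200, 3 transition classes), the tensor shapes through `forward` are:

```python
# t:          (batch_size, 36)       indices of the extracted features
# embeddings: (batch_size, 36 * 50)  after lookup + flatten in embedding_lookup
# hidden:     (batch_size, 200)      after embed_to_hidden -> ReLU -> dropout
# logits:     (batch_size, 3)        scores for SHIFT / LEFT-ARC / RIGHT-ARC
```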

`train`

```python
optimizer = optim.Adam(parser.model.parameters(), lr=lr)
loss_func = nn.CrossEntropyLoss()
```

`train_for_epoch`

```python
optimizer.zero_grad()  # clear gradients accumulated from the previous minibatch
logits = parser.model(train_x)
loss = loss_func(logits, train_y)
loss.backward()
optimizer.step()
```

The results are as follows:

dev UAS: 88.38

test UAS: 88.90

(f)

Skipped.